FS3: A sampling based method for top-k frequent subgraph mining

Authors

  • Tanay Kumar Saha
  • Mohammad Al Hasan
Abstract

Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS3, a sampling-based method that mines a small collection of subgraphs that are most frequent in a probabilistic sense. FS3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that potentially frequent subgraphs are sampled more often. In addition, FS3 is equipped with an innovative queue manager, which stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS3 is efficient and that it obtains subgraphs that are the most frequent among the subgraphs of a given size.
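The two ideas in the abstract, biased MCMC sampling and a bounded queue that keeps the most frequently sampled items on top, can be illustrated with a minimal Python sketch. This is not the FS3 algorithm itself. The state space is a toy dictionary of pattern names with hypothetical scores rather than real fixed-size subgraphs, and the names `mh_sample` and `TopKQueue`, the uniform proposal, and the evict-the-least-sampled policy are all assumptions made for illustration.

```python
import random
from collections import Counter


def mh_sample(scores, steps, seed=0):
    """Metropolis-Hastings walk over a toy set of patterns.

    `scores` maps each pattern to a positive, hypothetical "frequency score".
    Patterns are visited with probability proportional to their score, so
    high-scoring patterns are sampled more often.
    """
    rng = random.Random(seed)
    states = list(scores)
    current = rng.choice(states)
    for _ in range(steps):
        # Symmetric proposal (assumption): pick any pattern uniformly at random.
        proposal = rng.choice(states)
        accept_prob = min(1.0, scores[proposal] / scores[current])
        if rng.random() < accept_prob:
            current = proposal
        yield current


class TopKQueue:
    """Bounded store of sampled patterns, ranked by how often each was sampled."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()

    def add(self, pattern):
        if pattern in self.counts or len(self.counts) < self.capacity:
            self.counts[pattern] += 1
            return
        # Queue is full: evict the least-sampled pattern (assumed policy).
        victim = min(self.counts, key=self.counts.get)
        del self.counts[victim]
        self.counts[pattern] = 1

    def top_k(self, k):
        """Return the k most frequently sampled patterns, most frequent first."""
        return [pattern for pattern, _ in self.counts.most_common(k)]
```

To use the sketch, feed each sample from `mh_sample` into `TopKQueue.add` and read the result with `top_k`. In a real setting the states would be fixed-size subgraphs and the score would have to be derived from support in the graph database, which this toy example leaves out.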

Similar resources

A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSIGRAM (Spark based Single Graph Mining), a Spark based parallel frequent subgraph mining algorithm in a single large graph. Aiming to approach the two computational challenges of FSM, ...

Efficient Mining of Top-k Breaker Emerging Subgraph Patterns from Graph Datasets

This paper introduces a new type of discriminative subgraph pattern called breaker emerging subgraph pattern by introducing three constraints and two new concepts: base and breaker. A breaker emerging subgraph pattern consists of three subpatterns: a constrained emerging subgraph pattern, a set of bases and a set of breakers. An efficient approach is proposed for the discovery of top-k breaker ...

Pushing Constraints to Generate Top-K Closed Sequential Graph Patterns

In this paper, the problem of finding sequential patterns from graph databases is investigated. Two serious issues dealt with in this paper are the efficiency and effectiveness of the mining algorithm. A huge volume of sequential patterns is generated, most of which are uninteresting. The users have to go through a large number of patterns to find interesting results. In order to improve th...

Mining Frequent Graph Sequence Patterns Induced by Vertices

The mining of a complete set of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences (dynamic graphs or evolving graphs). In this paper, we define a novel class of subgraph subsequence called an “induced subgraph subsequence” to enable efficient mining of a complete set of freq...

Mining Top-K Frequent Closed Patterns without Minimum Support

In this paper, we propose a new mining task: mining top-k frequent closed patterns of length no less than min_ℓ, where k is the desired number of frequent closed patterns to be mined, and min_ℓ is the minimal length of each pattern. An efficient algorithm, called TFP, is developed for mining such patterns without minimum support. Two methods, closed node count and descendant sum are proposed to...

Journal:
  • Statistical Analysis and Data Mining

Volume 8

Publication date: 2014